Robust Speech Recognition via Anchor Word Representations

Authors

Brian King

I-Fan Chen

Yonatan Vaizman

Yuzong Liu

Roland Maas

Sree Hari Krishnan Parthasarathi

Björn Hoffmeister

Abstract

A challenge for speech recognition on voice-controlled household devices, like the Amazon Echo or Google Home, is robustness against interfering background speech. In this far-field setting, another person or a media device nearby can produce background speech that interferes with the device-directed speech. We expand on our previous work on device-directed speech detection in the far-field speech setting and introduce two approaches for robust acoustic modeling. Both methods are based on the idea of using an anchor word taken from the device-directed speech. Our first method employs a simple yet effective normalization of the acoustic features by subtracting the mean derived over the anchor word. The second method utilizes an encoder network that projects the anchor word onto a fixed-size embedding, which serves as an additional input to the acoustic model. The encoder network and acoustic model are jointly trained. Results on an in-house dataset reveal that, in the presence of background speech, the proposed approaches can achieve up to 35% relative word error rate reduction.
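The first method described in the abstract, subtracting the mean computed over the anchor word, can be illustrated with a minimal sketch. It assumes the acoustic features are per-frame vectors and that the frame range of the anchor word is already known (for example, from a wake-word detector). The function names, the frame indices, and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]  # one acoustic feature vector (hypothetical layout)


def anchor_mean(frames: Sequence[Frame], anchor_start: int, anchor_end: int) -> Frame:
    """Per-dimension mean of the frames in [anchor_start, anchor_end)."""
    anchor = frames[anchor_start:anchor_end]
    if not anchor:
        raise ValueError("anchor segment is empty")
    # zip(*anchor) groups the values of each feature dimension together.
    return [sum(dim) / len(anchor) for dim in zip(*anchor)]


def normalize_by_anchor(
    frames: Sequence[Frame], anchor_start: int, anchor_end: int
) -> List[Frame]:
    """Subtract the anchor-word mean from every frame of the utterance."""
    mean = anchor_mean(frames, anchor_start, anchor_end)
    return [[x - m for x, m in zip(frame, mean)] for frame in frames]


# Toy utterance: 4 frames, 2 feature dimensions.
# Assumption: frames 0-1 cover the anchor word, frames 2-3 the rest.
utterance = [
    [1.0, 4.0],
    [3.0, 6.0],
    [5.0, 2.0],
    [0.0, 5.0],
]
normalized = normalize_by_anchor(utterance, 0, 2)
```

Here the mean is estimated only from the anchor-word frames, so the statistics reflect the device-directed speaker rather than the whole utterance, which may also contain background speech. How the paper estimates and applies the mean in detail is not stated in the abstract.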

Similar resources

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems for both feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...

Headstart for speech segmentation: a neural signature for the anchor word effect.

Learning a new language is an incremental process that builds upon previously acquired information. To shed light on the mechanisms of this incremental process, we studied the on-line neurophysiological correlates of the so-called anchor word effect where newly learned words facilitate segmentation of novel words from continuous speech. Higher segmentation performance was observed for speech st...

Combining acoustic and articulatory feature information for robust speech recognition

The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label "articulatory" include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...

Continuous speech recognition using phone-based anchor point detection and diphone-based dp-matching

This paper deals with acoustic-phonetic decoding for CSR. There are two different processing modules depending on the steady or transient nature of the speech input. First the steady state speech processing module, called phone-based anchor point detection, performs some preprocessing allowing the selection of only a subset of the vocabulary under consideration. Secondly a general processing...

Publication date: 2017
